AE Rodriguez & Mo Cayer
March 1, 2019
Watson Analytics Sample Data: HR Employee Data Attrition & Performance
OBJECTIVES
ACME Inc. has an attrition problem. You have been asked to find out why, determine how to stem it, and identify the employees inclined to leave.
**Objective 1: Identify the “reasons” behind the employee exodus by developing a machine learning model**
+ We will be able to explore important questions such as:
    + "Show me a breakdown of distance from home by job role and attrition."
    + "Compare average monthly income by education and attrition."
+ We will be able to identify the most important reasons behind an employee's departure.
**Objective 2: Use the model to “stem the tide” of employees heading for the exits**
[1] "Age" "Attrition"
[3] "BusinessTravel" "DailyRate"
[5] "Department" "DistanceFromHome"
[7] "Education" "EducationField"
[9] "EnvironmentSatisfaction" "Gender"
[11] "HourlyRate" "JobInvolvement"
[13] "JobLevel" "JobRole"
[15] "JobSatisfaction" "MaritalStatus"
[17] "MonthlyIncome" "MonthlyRate"
[19] "NumCompaniesWorked" "OverTime"
[21] "PercentSalaryHike" "PerformanceRating"
[23] "RelationshipSatisfaction" "StockOptionLevel"
[25] "TotalWorkingYears" "TrainingTimesLastYear"
[27] "WorkLifeBalance" "YearsAtCompany"
[29] "YearsInCurrentRole" "YearsSinceLastPromotion"
[31] "YearsWithCurrManager"
Variables in DataSet, Examples
  Attrition     Education EducationField EnvironmentSatisfaction
1       Yes       College  Life_Sciences                  Medium
2        No Below_College  Life_Sciences                    High
4       Yes       College          Other               Very_High
5        No        Master  Life_Sciences               Very_High
7        No Below_College        Medical                     Low
8        No       College  Life_Sciences               Very_High
  Gender JobLevel JobSatisfaction MaritalStatus MonthlyIncome
1 Female        2       Very_High        Single          5993
2   Male        2          Medium       Married          5130
4   Male        1            High        Single          2090
5 Female        1            High       Married          2909
7   Male        1          Medium       Married          3468
8   Male        1       Very_High        Single          3068
Variables in DataSet, Examples continued
  OverTime PercentSalaryHike RelationshipSatisfaction
1      Yes                11                      Low
2       No                23                Very_High
4      Yes                15                   Medium
  TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
1                 8                     0             Bad
2                10                     3          Better
4                 7                     3          Better
  YearsAtCompany YearsSinceLastPromotion YearsWithCurrManager
1              6                       0                    5
2             10                       1                    7
4              0                       0                    0
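One of the exploratory questions from the objectives, comparing average monthly income by education and attrition, can be sketched on the six sample rows displayed above. This is only an illustration in plain Python on those six rows, not the analysis behind the slides; the function name `mean_income_by_group` is a hypothetical helper.

```python
from collections import defaultdict

# The six sample rows shown above: (Attrition, Education, MonthlyIncome).
sample_rows = [
    ("Yes", "College", 5993),
    ("No", "Below_College", 5130),
    ("Yes", "College", 2090),
    ("No", "Master", 2909),
    ("No", "Below_College", 3468),
    ("No", "College", 3068),
]

def mean_income_by_group(records):
    """Average MonthlyIncome for each (Education, Attrition) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [running sum, row count]
    for attrition, education, income in records:
        bucket = totals[(education, attrition)]
        bucket[0] += income
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

group_means = mean_income_by_group(sample_rows)
for (education, attrition), avg in sorted(group_means.items()):
    print(f"{education:<14} {attrition:<4} {avg:>8.1f}")
```

With only six rows the averages say nothing about the full data set; the point is the shape of the grouping.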
Correlation Plot 1: All Variables
Correlation Plot 2: All Variables
Identify Significant Predictors Using Random Forests
Random Forests
Confusion Matrix and Statistics
          Reference
Prediction   No  Yes
       No  1233    0
       Yes    0  237
Accuracy : 1
95% CI : (0.9975, 1)
No Information Rate : 0.8388
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 1
Mcnemar's Test P-Value : NA
Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.8388
Detection Rate : 0.8388
Detection Prevalence : 0.8388
Balanced Accuracy : 1.0000
'Positive' Class : No
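The accuracy and the No Information Rate reported above can be recomputed directly from the four cell counts. The sketch below is in plain Python rather than the tool that produced the output, and the function names are hypothetical. It reads rows as predictions and columns as reference labels, as the matrix header indicates.

```python
# Counts copied from the random forest confusion matrix above,
# keyed as (prediction, reference).
rf_counts = {
    ("No", "No"): 1233,
    ("No", "Yes"): 0,
    ("Yes", "No"): 0,
    ("Yes", "Yes"): 237,
}

def accuracy(counts):
    """Share of cases where the prediction equals the reference label."""
    total = sum(counts.values())
    correct = sum(n for (pred, ref), n in counts.items() if pred == ref)
    return correct / total

def no_information_rate(counts):
    """Share of the largest reference class, i.e. the accuracy obtained
    by always predicting that class."""
    total = sum(counts.values())
    per_class = {}
    for (_pred, ref), n in counts.items():
        per_class[ref] = per_class.get(ref, 0) + n
    return max(per_class.values()) / total

rf_accuracy = accuracy(rf_counts)
rf_nir = no_information_rate(rf_counts)
print(f"Accuracy: {rf_accuracy:.4f}")
print(f"No Information Rate: {rf_nir:.4f}")
```

The No Information Rate is the baseline each model has to beat: always predicting "No" is already right for 1233 of the 1470 cases.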
Monthly Income
Age
Logistic Regression
Confusion Matrix and Statistics
          Reference
Prediction   No  Yes
       No  1215  202
       Yes   18   35
Accuracy : 0.8503
95% CI : (0.8311, 0.8682)
No Information Rate : 0.8388
P-Value [Acc > NIR] : 0.1203
Kappa : 0.1939
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.9854
Specificity : 0.1477
Pos Pred Value : 0.8574
Neg Pred Value : 0.6604
Prevalence : 0.8388
Detection Rate : 0.8265
Detection Prevalence : 0.9639
Balanced Accuracy : 0.5665
'Positive' Class : No
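Because the positive class is "No", sensitivity here measures how well the model recognizes employees who stay, and specificity measures how well it catches those who leave. A minimal sketch, assuming the usual definitions of these statistics, recomputes them from the logistic regression counts above; `class_metrics` is a hypothetical helper name.

```python
def class_metrics(tp, fn, fp, tn):
    """Class-wise statistics for a 2x2 confusion matrix.

    tp/fn: positive-class cases predicted correctly / incorrectly.
    fp/tn: negative-class cases predicted incorrectly / correctly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Logistic regression counts from the matrix above, positive class = "No".
lr_metrics = class_metrics(
    tp=1215,  # predicted No, actually No
    fn=18,    # predicted Yes, actually No
    fp=202,   # predicted No, actually Yes
    tn=35,    # predicted Yes, actually Yes
)
for name, value in lr_metrics.items():
    print(f"{name:<18} {value:.4f}")
```

A specificity of 35 / 237 means the model identifies about 15% of the employees who actually left, which the overall accuracy of 0.8503 does not show.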
Age & Years at the Company
Monthly Income & Distance From Home
Naive Bayes
Confusion Matrix and Statistics
          Reference
Prediction   No  Yes
       No  1122  133
       Yes  111  104
Accuracy : 0.834
95% CI : (0.814, 0.8527)
No Information Rate : 0.8388
P-Value [Acc > NIR] : 0.7046
Kappa : 0.3624
Mcnemar's Test P-Value : 0.1788
Sensitivity : 0.9100
Specificity : 0.4388
Pos Pred Value : 0.8940
Neg Pred Value : 0.4837
Prevalence : 0.8388
Detection Rate : 0.7633
Detection Prevalence : 0.8537
Balanced Accuracy : 0.6744
'Positive' Class : No
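Kappa compares the observed agreement between predictions and reference labels with the agreement expected by chance from the row and column totals. The sketch below, assuming the standard Cohen's kappa formula, recomputes the value for the Naive Bayes matrix above; `cohens_kappa` is a hypothetical helper name.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix given as a list of rows
    (rows = predictions, columns = reference labels)."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of matching row and column shares.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

# Naive Bayes counts from the matrix above, class order: No, Yes.
nb_matrix = [
    [1122, 133],  # predicted No
    [111, 104],   # predicted Yes
]
nb_kappa = cohens_kappa(nb_matrix)
print(f"Kappa: {nb_kappa:.4f}")
```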
Comparing Models
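As a rough sketch of the comparison, the figures reported in the three confusion-matrix outputs can be collected and ranked by different metrics. The values below are copied from those outputs; the ranking helper `rank_models` is hypothetical and is not the comparison method used for the original slide.

```python
# Accuracy, Kappa and Balanced Accuracy as reported for each model.
reported = {
    "Random Forests": {
        "accuracy": 1.0, "kappa": 1.0, "balanced_accuracy": 1.0,
    },
    "Logistic Regression": {
        "accuracy": 0.8503, "kappa": 0.1939, "balanced_accuracy": 0.5665,
    },
    "Naive Bayes": {
        "accuracy": 0.834, "kappa": 0.3624, "balanced_accuracy": 0.6744,
    },
}

def rank_models(results, metric):
    """Model names ordered from best to worst on the given metric."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)

by_accuracy = rank_models(reported, "accuracy")
by_balanced = rank_models(reported, "balanced_accuracy")
print("By accuracy:         ", by_accuracy)
print("By balanced accuracy:", by_balanced)
```

Accuracy alone places logistic regression ahead of Naive Bayes, while balanced accuracy and Kappa reverse that order, because Naive Bayes identifies more of the employees who actually left.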
Who is likely to leave?
     Attrition MonthlyIncome Age OverTime DailyRate TotalWorkingYears
1933       Yes          9854  28      Yes      1475                 6
410         No         16015  41       No       334                22
198         No          2720  30       No      1427                 6
                   JobRole MonthlyRate HourlyRate DistanceFromHome
1933       Sales_Executive       23352         84               13
410                Manager       15896         88                2
198  Laboratory_Technician       11162         35                2
     YearsAtCompany
1933              2
410              22
198               5
1933  410  198
 Yes   No   No
Levels: No Yes
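The predicted labels for the three employees can be checked against their recorded Attrition values. The source does not state which model produced these predictions, so the sketch below only compares the two sets of labels shown above. It treats the numbers 1933, 410 and 198 as row identifiers, which is an assumption.

```python
# Recorded Attrition values and predicted labels for the three rows above,
# keyed by the row identifier shown in the output.
actual = {"1933": "Yes", "410": "No", "198": "No"}
predicted = {"1933": "Yes", "410": "No", "198": "No"}

# Employees the model flags as likely to leave.
flagged = [row_id for row_id, label in predicted.items() if label == "Yes"]

# Whether each prediction agrees with the recorded outcome.
agreement = {row_id: predicted[row_id] == actual[row_id] for row_id in actual}

print("Flagged as likely to leave:", flagged)
print("Prediction matches record: ", agreement)
```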